Vector Spaces 



The idea of vectors dates back to the mid-1800s, but our current understanding of the concept 
waited until Peano's work in 1888. Even then it took many years to understand the importance and 
generality of the ideas involved. This one underlying idea can be used to describe the forces and 
accelerations in Newtonian mechanics and the potential functions of electromagnetism and the states 
of systems in quantum mechanics and the least-squares fitting of experimental data and much more. 

6.1 The Underlying Idea 

What is a vector? 

If your answer is along the lines of "something with magnitude and direction" then you have something 
to unlearn. Maybe you heard this definition in a class that I taught. If so, I lied; sorry about 
that. At the very least I didn't tell the whole truth. Does an automobile have magnitude and direction? 
Does that make it a vector? 

The idea of a vector is far more general than the picture of a line with an arrowhead attached to 
its end. That special case is an important one, but it doesn't tell the whole story, and the whole story 
is one that unites many areas of mathematics. The short answer to the question of the first paragraph 
is 

A vector is an element of a vector space. 

Roughly speaking, a vector space is some set of things for which the operation of addition is 
defined and the operation of multiplication by a scalar is defined. You don't necessarily have to be able 
to multiply two vectors by each other or even to be able to define the length of a vector, though those 
are very useful operations and will show up in most of the interesting cases. You can add two cubic 
polynomials together: 

$$(2 - 3x + 4x^2 - 7x^3) + (-8 - 2x + 11x^2 + 9x^3)$$

makes sense, resulting in a cubic polynomial. You can multiply such a polynomial by* 17 and it's still 
a cubic polynomial. The set of all cubic polynomials in x forms a vector space and the vectors are the 
individual cubic polynomials. 
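To make this concrete, here is a minimal Python sketch (an illustration, not taken from the text) that stores a cubic polynomial $c_0 + c_1x + c_2x^2 + c_3x^3$ by its four coefficients. The helper names `add`, `scale`, and `evaluate` are hypothetical; they implement exactly the two operations described above, plus evaluation at a point.

```python
# A cubic polynomial c0 + c1*x + c2*x**2 + c3*x**3 is stored as the tuple (c0, c1, c2, c3).

def add(p, q):
    """Vector addition: add the coefficients term by term."""
    return tuple(a + b for a, b in zip(p, q))

def scale(alpha, p):
    """Multiplication by a scalar: multiply every coefficient by alpha."""
    return tuple(alpha * a for a in p)

def evaluate(p, x):
    """Value of the polynomial at the number x, by Horner's rule."""
    result = 0
    for c in reversed(p):
        result = result * x + c
    return result

p = (2, -3, 4, -7)    # 2 - 3x + 4x^2 - 7x^3
q = (-8, -2, 11, 9)   # -8 - 2x + 11x^2 + 9x^3
print(add(p, q))      # (-6, -5, 15, 2), i.e. -6 - 5x + 15x^2 + 2x^3
print(scale(17, p))   # (34, -51, 68, -119)
```

Both results are again 4-tuples, which is the coefficient-level statement that the sum and the scalar multiple are still cubic polynomials.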

The common example of directed line segments (arrows) in two or three dimensions fits this idea, 
because you can add such arrows by the parallelogram law and you can multiply them by numbers, 
changing their length (and reversing direction for negative numbers). 

Another, equally important example consists of all ordinary real-valued functions of a real variable: 
two such functions can be added to form a third one, and you can multiply a function by a number to 
get another function. The example of cubic polynomials above is then a special case of this one. 

A complete definition of a vector space requires pinning down these ideas and making them less 
vague. In the end, the way to do that is to express the definition as a set of axioms. From these axioms 
the general properties of vectors will follow. 

A vector space is a set whose elements are called "vectors" and such that there are two operations 
defined on them: you can add vectors to each other and you can multiply them by scalars (numbers). 
These operations must obey certain simple rules, the axioms for a vector space. 



* The physicist's canonical random number 
James Nearing, University of Miami 
6.2 Axioms 

The precise definition of a vector space is given by listing a set of axioms. For this purpose, I'll denote 
vectors by arrows over a letter, and I'll denote scalars by Greek letters. These scalars will, for our 
purpose, be either real or complex numbers — it makes no difference which for now.* 

1 There is a function, addition of vectors, denoted +, so that $\vec v_1 + \vec v_2$ is another vector. 

2 There is a function, multiplication by scalars, denoted by juxtaposition, so that $\alpha\vec v$ is a vector. 

3 $(\vec v_1 + \vec v_2) + \vec v_3 = \vec v_1 + (\vec v_2 + \vec v_3)$ (the associative law). 

4 There is a zero vector, so that for each $\vec v$, $\vec v + \vec O = \vec v$. 

5 There is an additive inverse for each vector, so that for each $\vec v$ there is another vector $\vec v\,'$ so that $\vec v + \vec v\,' = \vec O$. 

6 The commutative law of addition holds: $\vec v_1 + \vec v_2 = \vec v_2 + \vec v_1$. 

7 $(\alpha + \beta)\vec v = \alpha\vec v + \beta\vec v$. 

8 $(\alpha\beta)\vec v = \alpha(\beta\vec v)$. 

9 $\alpha(\vec v_1 + \vec v_2) = \alpha\vec v_1 + \alpha\vec v_2$. 

10 $1\vec v = \vec v$. 

In axioms 1 and 2 I called these operations "functions." Is that the right use of the word? 
Yes. Without going into the precise definition of the word (see section 12.1), you know it means that 
you have one or more independent variables and you have a single output. Addition of vectors and 
multiplication by scalars certainly fit that idea. 

6.3 Examples of Vector Spaces 

Examples of sets satisfying these axioms abound: 

1 The usual picture of directed line segments in a plane, using the parallelogram law of addition. 

2 The set of real-valued functions of a real variable, defined on the domain [a < x < b]. Addition is 
defined pointwise. If $f_1$ and $f_2$ are functions, then the value of the function $f_1 + f_2$ at the point 
$x$ is the number $f_1(x) + f_2(x)$. That is, $f_1 + f_2 = f_3$ means $f_3(x) = f_1(x) + f_2(x)$. Similarly, 
multiplication by a scalar is defined as $(\alpha f)(x) = \alpha\big(f(x)\big)$. Notice a small confusion of notation in 
this expression. The first multiplication, $(\alpha f)$, multiplies the scalar $\alpha$ by the vector $f$; the second 
multiplies the scalar $\alpha$ by the number $f(x)$. 

3 Like example 2, but restricted to continuous functions. The one observation beyond the previous 
example is that the sum of two continuous functions is continuous. 

4 Like example 2, but restricted to bounded functions. The one observation beyond the previous 
example is that the sum of two bounded functions is bounded. 

5 The set of n-tuples of real numbers: $(a_1, a_2, \ldots, a_n)$ where addition and scalar multiplication are 
defined by 

$$(a_1, \ldots, a_n) + (b_1, \ldots, b_n) = (a_1 + b_1, \ldots, a_n + b_n), \qquad \alpha(a_1, \ldots, a_n) = (\alpha a_1, \ldots, \alpha a_n)$$

6 The set of square-integrable real-valued functions of a real variable on the domain [a < x < b]. 

That is, restrict example 2 to those functions with $\int_a^b dx\,|f(x)|^2 < \infty$. Axiom 1 is the only one 
requiring more than a second to check. 

7 The set of solutions to the equation $\partial^2\phi/\partial x^2 + \partial^2\phi/\partial y^2 = 0$ in any fixed domain (Laplace's 
equation). 



For a nice introduction online see distance-ed.math.tamu.edu/Math640, chapter three. 
8 Like example 5, but with $n = \infty$. 

9 Like example 8, but each vector has only a finite number of non-zero entries. 

10 Like example 8, but restricting the set so that $\sum_{k=1}^\infty |a_k|^2 < \infty$. Again, only axiom one takes work. 

11 Like example 10, but the sum is $\sum_{k=1}^\infty |a_k| < \infty$. 

12 Like example 10, but $\sum_{k=1}^\infty |a_k|^p < \infty$ ($p \ge 1$). 

13 Like example 6, but $\int_a^b dx\,|f(x)|^p < \infty$. 

14 Any of examples 2-13, but make the scalars complex, and the functions complex valued. 

15 The set of all $n \times n$ matrices, with addition being defined element by element. 

16 The set of all polynomials with the obvious laws of addition and multiplication by scalars. 

17 Complex valued functions on the domain [a < x < b] with $\sum_x |f(x)|^2 < \infty$. (Whatever this 
means. See problem 6.18.) 

18 $\{\vec O\}$, the space consisting of the zero vector alone. 

19 The set of all solutions to the equations describing small motions of the surface of a drumhead. 

20 The set of solutions of Maxwell's equations without charges or currents and with finite energy. 
That is, $\int [E^2 + B^2]\,d^3x < \infty$. 

21 The set of all functions of a complex variable that are differentiable everywhere and satisfy 

$$\int dx\,dy\,e^{-(x^2 + y^2)}\,|f(z)|^2 < \infty,$$

where $z = x + iy$. 

To verify that any of these is a vector space you have to run through the ten axioms, checking 
each one. (Actually, in a couple of pages there's a theorem that will greatly simplify this.) To see what 
is involved, take the first, most familiar example, arrows that all start at one point, the origin. I'll go 
through the details of each of the ten axioms to show that the process of checking is very simple. There 
are some cases for which this checking isn't so simple, but the difficulty is usually confined to verifying 
axiom one. 

The picture shows the definitions of addition of vectors and multiplication by scalars, the first two 
axioms. The commutative law, axiom 6, is clear, as the diagonal of the parallelogram doesn't depend 
on which side you're looking at. 




[Figure: $(\vec A + \vec B) + \vec C$ and $\vec A + (\vec B + \vec C)$] 

The associative law, axiom 3, is also illustrated in the picture. The zero vector, axiom 4, appears in 
this picture as just a point, the origin. 

The definition of multiplication by a scalar is that the length of the arrow is changed (or even 
reversed) by the factor given by the scalar. Axioms 7 and 8 are then simply the statement that the 
graphical interpretation of multiplication of numbers involves adding and multiplying their lengths. 
[Figure: $\alpha(\vec A + \vec B)$] 

Axioms 5 and 9 appear in this picture. 

Finally, axiom 10 is true because you leave the vector alone when you multiply it by one. 

This process looks almost too easy. Some of the axioms even look as though they are trivial and 
unnecessary. The last one, for example: why do you have to assume that multiplication by one leaves 
the vector alone? For an answer, I will show an example of something that satisfies all of axioms one 
through nine but not the tenth. These processes, addition of vectors and multiplication by scalars, are 
functions. I could write "$f(\vec v_1, \vec v_2)$" instead of "$\vec v_1 + \vec v_2$" and write "$g(\alpha, \vec v)$" instead of "$\alpha\vec v$". The 
standard notation is just that, a common way to write a vector-valued function of two variables. I 
can define any function that I want and then see if it satisfies the required properties. 

On the set of arrows just above, redefine multiplication by a scalar (the function g of the 
preceding paragraph) to be the zero vector for all scalars and vectors. That is, $\alpha\vec v = \vec O$ for all $\alpha$ and $\vec v$. 
Look back and you see that this definition satisfies all the assumptions 1-9 but not 10. For example, 
9: $\alpha(\vec v_1 + \vec v_2) = \alpha\vec v_1 + \alpha\vec v_2$ because both sides of the equation are the zero vector. This observation 
proves that the tenth axiom is independent of the others. If you could derive the tenth axiom from the 
first nine, then this example couldn't exist. This construction is of course not a vector space. 
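A spot check on sample vectors cannot prove that an axiom holds for every vector, but it makes the counterexample tangible. The sketch below assumes arrows are stored as (x, y) component pairs; `g` is the redefined multiplication of the paragraph above, and `check_scalar_axioms` is a hypothetical helper that tests axioms 7 through 10 for one choice of scalars and vectors.

```python
# Arrows in the plane are represented by their components (x, y).
def add(u, v):
    """The usual parallelogram-law addition."""
    return (u[0] + v[0], u[1] + v[1])

ZERO = (0.0, 0.0)

def g(alpha, v):
    """The deliberately 'broken' multiplication by a scalar: always the zero vector."""
    return ZERO

def check_scalar_axioms(alpha, beta, u, v):
    """Report which of axioms 7, 8, 9, 10 hold for these particular values."""
    return {
        7: g(alpha + beta, u) == add(g(alpha, u), g(beta, u)),
        8: g(alpha * beta, u) == g(alpha, g(beta, u)),
        9: g(alpha, add(u, v)) == add(g(alpha, u), g(alpha, v)),
        10: g(1, u) == u,
    }

print(check_scalar_axioms(2.0, -3.5, (1.0, 2.0), (-4.0, 0.5)))
# {7: True, 8: True, 9: True, 10: False}
```

Axioms 7 through 9 pass because every side of those equations collapses to the zero vector; only axiom 10 notices that multiplication by one has destroyed the vector.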

Function Spaces 

Is example 2 a vector space? How can a function be a vector? This comes down to your understanding 
of the word "function." Is f(x) a function or is f(x) a number? Answer: it's a number. This is a 
confusion caused by the conventional notation for functions. We routinely call f(x) a function, but 
it is really the result of feeding the particular value, x, to the function f in order to get the number 
f(x). This confusion in notation is so ingrained that it's hard to change, though in more sophisticated 
mathematics books it is changed. 

In a better notation, the symbol f is the function, expressing the 
relation between all the possible inputs and their corresponding outputs. 
Then f(1), or f(π), or f(x) are the results of feeding f the particular 
inputs, and the results are (at least for example 2) real numbers. Think 
of the function f as the whole graph relating input to output; the pair 
(x, f(x)) is then just one point on the graph. Adding two functions is 
adding their graphs. For a precise, set theoretic definition of the word 
function, see section 12.1. Reread the statement of example 2 in light of 
these comments. 
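A programming language with first-class functions makes the distinction between f and f(x) explicit: a function is itself a value that can be combined with other functions, while calling it produces a number. A minimal Python sketch of the pointwise operations of example 2; the helper names `add` and `scale` are hypothetical.

```python
import math

def add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x). The result is again a function."""
    return lambda x: f(x) + g(x)

def scale(alpha, f):
    """Scalar multiple: (alpha f)(x) = alpha * f(x)."""
    return lambda x: alpha * f(x)

h = add(math.sin, scale(3.0, math.exp))   # the vector sin + 3 exp
print(h(0.0))                             # 3.0, the number obtained by feeding 0 to h
```

Here `h` is the vector; `h(0.0)` is just one point on its graph.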

Special Function Space 

Go through another of the examples of vector spaces written above: number 6, the square-integrable 
real-valued functions on the interval a < x < b. The single difficulty here is the first axiom: is the sum 
of two square-integrable functions itself square-integrable? The other nine axioms are yours to check. 
Suppose that 

$$\int_a^b f(x)^2\,dx < \infty \quad\text{and}\quad \int_a^b g(x)^2\,dx < \infty.$$

Simply note the combination 

$$\big(f(x) + g(x)\big)^2 + \big(f(x) - g(x)\big)^2 = 2f(x)^2 + 2g(x)^2$$
The integral of the right-hand side is by assumption finite, so the same must hold for the left side. 
Because both terms on the left are non-negative, each of them separately has a finite integral. 
This says that the sum (and difference) of two square-integrable functions is square-integrable. For this 
example then, it isn't very difficult to show that it satisfies the axioms for a vector space, but it requires 
more than just a glance. 

There are a few properties of vector spaces that seem to be missing. There is the somewhat odd 
notation $\vec v\,'$ for the additive inverse in axiom 5. Isn't that just $-\vec v$? Isn't the zero vector simply the 
number zero times a vector? Yes in both cases, but these are theorems that follow easily from the ten 
axioms listed. See problem 6.20. I'll do part (a) of that exercise as an example here: 

Theorem: the vector $\vec O$ is unique. 
Proof: assume it is not; then there are two such vectors, $\vec O_1$ and $\vec O_2$. 
By [4], $\vec O_1 + \vec O_2 = \vec O_1$ ($\vec O_2$ is a zero vector) 
By [6], the left side is $\vec O_2 + \vec O_1$ 
By [4], this is $\vec O_2$ ($\vec O_1$ is a zero vector) 
Put these together and $\vec O_1 = \vec O_2$. 

Theorem: If a subset of a vector space is closed under addition and multiplication by scalars, 
then it is itself a vector space. This means that if you add two elements of this subset to each other 
they remain in the subset and multiplying any element of the subset by a scalar leaves it in the subset. 
It is a "subspace." 

Proof: the assumption of the theorem is that axioms 1 and 2 are satisfied as regards the subset. That 
axioms 3 through 10 hold follows because the elements of the subset inherit their properties from the 
larger vector space of which they are a part. Is this all there is to it? Not quite. Axioms 4 and 5 take 
a little more thought, and need the results of problem 6.20, parts (b) and (d). 

6.4 Linear Independence 

A set of non-zero vectors is linearly dependent if one element of the set can be written as a linear 
combination of the others. The set is linearly independent if this cannot be done. 
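For vectors given by components (example 5), the definition can be tested mechanically: a finite set is linearly independent exactly when the matrix whose rows are those vectors has rank equal to the number of vectors. Below is a minimal sketch that uses exact rational arithmetic, so no rounding tolerance is needed. The function names are hypothetical, and row reduction is our choice of method, not something the text prescribes.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # look for a pivot at or below row r with a non-zero entry in this column
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # clear this column in every other row
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def linearly_independent(vectors):
    """True when no vector is a combination of the others, i.e. rank == count."""
    return rank(vectors) == len(vectors)

print(linearly_independent([(1, 0, 2), (0, 1, 1)]))             # True
print(linearly_independent([(1, 0, 2), (0, 1, 1), (2, 3, 7)]))  # False: (2,3,7) = 2(1,0,2) + 3(0,1,1)
```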

Bases, Dimension, Components 

A basis for a vector space is a linearly independent set of vectors such that any vector in the space can 
be written as a linear combination of elements of this set. The dimension of the space is the number 
of elements in this basis. 

If you take the usual vector space of arrows that start from the origin and lie in a plane, the 
common basis is denoted $\hat x, \hat y$ (or $\hat\imath, \hat\jmath$). If I propose a basis consisting of three vectors in 
the plane whose sum is zero, they will certainly span the space: every vector can be written as a linear 
combination of them. They are, however, redundant; the sum of all three is zero, so they aren't linearly 
independent and aren't a basis. If you use them as if they are a basis, the components of a given vector 
won't be unique. Maybe that's o.k. and you want to do it, but either be careful or look up the 
mathematical subject called "frames." 

Beginning with the most elementary problems in physics and mathematics, it is clear that the 
choice of an appropriate coordinate system can provide great computational advantages. In dealing 
with the usual two and three dimensional vectors it is useful to express an arbitrary vector as a sum of 
unit vectors. Similarly, the use of Fourier series for the analysis of functions is a very powerful tool in 
analysis. These two ideas are essentially the same thing when you look at them as aspects of vector 
spaces. 

If the elements of the basis are denoted $\vec e_k$, and a vector $\vec a$ is 

$$\vec a = \sum_k a_k \vec e_k,$$

the numbers $\{a_k\}$ are called the components of $\vec a$ in the specified basis. Note that you don't have to 
talk about orthogonality or unit vectors or any other properties of the basis vectors save that they span 
the space and they're independent. 
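Finding the components of a vector in a basis that is neither orthogonal nor normalized just means solving a linear system. In two dimensions Cramer's rule is enough. The following is a minimal sketch in which the skewed basis $\vec e_1 = (1, 0)$, $\vec e_2 = (1, 1)$ and the function name `components_2d` are hypothetical choices for illustration.

```python
from fractions import Fraction

def components_2d(e1, e2, a):
    """Solve a = a1*e1 + a2*e2 for the components (a1, a2) by Cramer's rule.
    Any two independent vectors in the plane will do; they need not be
    orthogonal or of unit length."""
    det = Fraction(e1[0] * e2[1] - e2[0] * e1[1])
    if det == 0:
        raise ValueError("e1 and e2 are linearly dependent, so they are not a basis")
    a1 = (a[0] * e2[1] - e2[0] * a[1]) / det
    a2 = (e1[0] * a[1] - a[0] * e1[1]) / det
    return a1, a2

# A deliberately skewed basis: e1 = (1, 0), e2 = (1, 1).
a1, a2 = components_2d((1, 0), (1, 1), (3, 5))
print(a1, a2)   # -2 5, since -2*(1, 0) + 5*(1, 1) = (3, 5)
```

The only property of the basis that the code relies on is independence, via the non-zero determinant.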

Example 1 is the prototype for the subject, and the basis usually chosen is the one designated 
$\hat x, \hat y$ (and $\hat z$ for three dimensions). Another notation for this is $\hat\imath, \hat\jmath, \hat k$; I'll use $\hat x$-$\hat y$. In any case, the 
two (or three) arrows are at right angles to each other. 

In example 5, the simplest choice of basis is 

$$\vec e_1 = (1\ 0\ 0\ \ldots\ 0), \quad \vec e_2 = (0\ 1\ 0\ \ldots\ 0), \quad \ldots, \quad \vec e_n = (0\ 0\ 0\ \ldots\ 1) \tag{6.1}$$

In example 6, if the domain of the functions is from $-\infty$ to $+\infty$, a possible basis is the set of 
functions 

$$\psi_n(x) = x^n e^{-x^2} \qquad (n = 0, 1, 2, \ldots)$$

The major distinction between this and the previous cases is that the dimension here is infinite. There 
is a basis vector corresponding to each non-negative integer. It's not obvious that this is a basis, but 
it's true. 

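If two vectors are equal to each other and you express them in the same basis, the corresponding 
components must be equal. 

$$\sum_i a_i \vec e_i = \sum_i b_i \vec e_i \implies a_i = b_i \ \text{for all } i \tag{6.2}$$

This is where linear independence matters: subtracting gives $\sum_i (a_i - b_i)\vec e_i = \vec O$, and if any coefficient $a_i - b_i$ 
were non-zero, that $\vec e_i$ could be written as a linear combination of the others. 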

Suppose you have the relation between two functions of time 

$$A - B\omega + \gamma t = \beta t, \tag{6.3}$$

that is, that the two functions are the same. Think of this in terms of vectors: on the vector space of 
polynomials in $t$ a basis is 

$$\vec e_0 = 1, \quad \vec e_1 = t, \quad \vec e_2 = t^2, \ \text{etc.}$$

Translate the preceding equation into this notation. 

$$(A - B\omega)\vec e_0 + \gamma\vec e_1 = \beta\vec e_1 \tag{6.4}$$

For this to be valid the corresponding components must match: 

$$A - B\omega = 0, \quad\text{and}\quad \gamma = \beta$$



Differential Equations 

When you encounter differential equations such as 

$$m\frac{d^2x}{dt^2} + b\frac{dx}{dt} + kx = 0, \qquad\text{or}\qquad \gamma\frac{d^3x}{dt^3} + kt^2\frac{dx}{dt} + \alpha e^{-\beta t}x = 0, \tag{6.5}$$

the sets of solutions to each of these equations form vector spaces. All you have to do is to check the 
axioms, and because of the theorem in section 6.3 you don't even have to do all of that. The solutions 
are functions, and as such they are elements of the vector space of example 2. All you need to do now 
is to verify that the sum of two solutions is a solution and that a constant times a solution is a solution. 
That's what the phrase "linear, homogeneous" means. 
Another common differential equation is 

$$\frac{d^2\theta}{dt^2} + \frac{g}{\ell}\sin\theta = 0$$

This describes the motion of an undamped pendulum, and the set of its solutions does not form a vector 
space. The sum of two solutions is not a solution. 
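The failure of superposition can be seen without solving anything. If $\theta_1(t)$ and $\theta_2(t)$ both satisfy $\ddot\theta + (g/\ell)\sin\theta = 0$, then substituting their sum into the equation leaves the residual $(g/\ell)[\sin(\theta_1 + \theta_2) - \sin\theta_1 - \sin\theta_2]$, which is zero only in special cases. A minimal sketch comparing this with the linear equation $\ddot x + (g/\ell)x = 0$; the value of g/ℓ and the function names are hypothetical.

```python
import math

G_OVER_L = 9.8   # hypothetical value of g/l in s^-2, for illustration only

def pendulum_mismatch(theta1, theta2):
    """Suppose theta1(t) and theta2(t) both solve theta'' + (g/l) sin(theta) = 0.
    Then (theta1 + theta2)'' = -(g/l)(sin theta1 + sin theta2), and substituting
    the sum into the equation leaves this residual at any instant where the two
    solutions have the values theta1 and theta2."""
    return G_OVER_L * (math.sin(theta1 + theta2) - math.sin(theta1) - math.sin(theta2))

def oscillator_mismatch(x1, x2):
    """The same computation for the linear equation x'' + (g/l) x = 0:
    (g/l)[(x1 + x2) - x1 - x2], which is identically zero."""
    return G_OVER_L * ((x1 + x2) - x1 - x2)

print(pendulum_mismatch(1.0, 0.5))    # clearly non-zero: the sum is not a solution
print(oscillator_mismatch(1.0, 0.5))  # 0.0
```

For small angles, where sin θ ≈ θ, the pendulum residual becomes tiny. This is the usual reason the linearized pendulum is treated as a vector-space problem.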

The first of Eqs. (6.5) has two independent solutions, 

$$x_1(t) = e^{-\gamma t}\cos\omega' t, \qquad\text{and}\qquad x_2(t) = e^{-\gamma t}\sin\omega' t \tag{6.6}$$

where $\gamma = b/2m$ and $\omega' = \sqrt{k/m - b^2/4m^2}$. This is from Eq. (4.8). Any solution of this differential 
equation is a linear combination of these functions, and I can restate that fact in the language of this 
chapter by saying that $x_1$ and $x_2$ form a basis for the vector space of solutions of the damped oscillator 
equation. It has dimension two. 
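The statement that $x_1$ and $x_2$ form a basis has a practical reading: each pair of initial conditions $x(0)$, $\dot x(0)$ picks out exactly one pair of components. Since $x_1(0) = 1$, $x_2(0) = 0$, $\dot x_1(0) = -\gamma$ and $\dot x_2(0) = \omega'$, the components are $c_1 = x(0)$ and $c_2 = (\dot x(0) + \gamma x(0))/\omega'$. A minimal sketch under hypothetical parameter values, taking $\gamma = b/2m$ and $\omega' = \sqrt{k/m - b^2/4m^2}$; the residual check uses finite differences, so it is only approximate.

```python
import math

# Hypothetical parameter values (underdamped, so k/m > b^2/4m^2).
m, b, k = 1.0, 0.4, 4.0
gamma = b / (2 * m)
omega_p = math.sqrt(k / m - gamma ** 2)   # omega' = sqrt(k/m - b^2/4m^2)

def x1(t):
    return math.exp(-gamma * t) * math.cos(omega_p * t)

def x2(t):
    return math.exp(-gamma * t) * math.sin(omega_p * t)

def components(x0, v0):
    """Components (c1, c2) in the basis {x1, x2} of the solution with
    x(0) = x0 and x'(0) = v0, using x(0) = c1 and x'(0) = -gamma*c1 + omega'*c2."""
    c1 = x0
    c2 = (v0 + gamma * x0) / omega_p
    return c1, c2

def residual(f, t, h=1e-4):
    """m f'' + b f' + k f at time t, with derivatives from central differences."""
    fpp = (f(t + h) - 2 * f(t) + f(t - h)) / h ** 2
    fp = (f(t + h) - f(t - h)) / (2 * h)
    return m * fpp + b * fp + k * f(t)

c1, c2 = components(x0=1.0, v0=0.0)

def x(t):
    """The particular solution c1*x1 + c2*x2 that starts at rest from x = 1."""
    return c1 * x1(t) + c2 * x2(t)

print(c1, c2)
print(abs(residual(x, 0.7)) < 1e-5)   # True, up to finite-difference error
```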

The second equation of the pair (6.5) is a third order differential equation, and as such you 
will need to specify three conditions to determine the solution, that is, to determine all three arbitrary 
constants. In other words, the dimension of the solution space of this equation is three. 

In chapter 4 on the subject of differential equations, one of the topics was simultaneous differential 
equations, coupled oscillations. The simultaneous differential equations, Eq. (4.45), are 

$$m_1\frac{d^2x_1}{dt^2} = -k_1x_1 - k_3(x_1 - x_2), \qquad\text{and}\qquad m_2\frac{d^2x_2}{dt^2} = -k_2x_2 - k_3(x_2 - x_1)$$

and have solutions that are pairs of functions. In the development of section 4.10 (at least for the equal 
mass, symmetric case), I found four pairs of functions that satisfied the equations. Now translate that 
into the language of this chapter, using the notation of column matrices for the functions. The solution 
is the vector 

$$\begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix}$$

and the four basis vectors for this four-dimensional vector space are 



Any solution of the differential equations is a linear combination of these. In the original notation, you 
have Eq. (4.52). In the current notation you have 

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = A_1\vec e_1 + A_2\vec e_2 + A_3\vec e_3 + A_4\vec e_4$$



6.5 Norms 

The "norm" or length of a vector is a particularly important type of function that can be defined on a 
vector space. It is a function, usually denoted by $\|\ \|$, that satisfies 

1. $\|\vec v\| \ge 0$; $\|\vec v\| = 0$ if and only if $\vec v = \vec O$ 

2. $\|\alpha\vec v\| = |\alpha|\,\|\vec v\|$ 

3. $\|\vec v_1 + \vec v_2\| \le \|\vec v_1\| + \|\vec v_2\|$ (the triangle inequality) 

The distance between two vectors $\vec v_1$ and $\vec v_2$ is taken to be $\|\vec v_1 - \vec v_2\|$. 
6.6 Scalar Product 

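The scalar product of two vectors is a scalar valued function of two vector variables. It could be denoted 
as $f(\vec u, \vec v)$, but a standard notation for it is $(\vec u, \vec v)$. It must satisfy the requirements 

1. $(\vec w, \vec u + \vec v) = (\vec w, \vec u) + (\vec w, \vec v)$ 

2. $(\vec w, \alpha\vec v) = \alpha(\vec w, \vec v)$ 

3. $(\vec u, \vec v)^* = (\vec v, \vec u)$ 

4. $(\vec v, \vec v) \ge 0$; and $(\vec v, \vec v) = 0$ if and only if $\vec v = \vec O$ 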

When a scalar product exists on a space, a norm naturally does too: 

$$\|\vec v\| = \sqrt{(\vec v, \vec v)}. \tag{6.7}$$

That this is a norm will follow from the Cauchy-Schwarz inequality. Not all norms come from scalar 
products. 

Examples 

Use the examples of section 6.3 to see what these are. The numbers here refer to the numbers of that 
section. 

1 A norm is the usual picture of the length of the line segment. A scalar product is the usual product 
of lengths times the cosine of the angle between the vectors. 

$$(\vec u, \vec v) = \vec u\cdot\vec v = uv\cos\vartheta. \tag{6.8}$$

4 A norm can be taken as the least upper bound of the magnitude of the function. This is distinguished 
from the "maximum" in that the function may not actually achieve a maximum value. Since it is 
bounded however, there is an upper bound (many in fact) and we take the smallest of these as the 
norm. On $-\infty < x < +\infty$, the function $|\tan^{-1}x|$ has $\pi/2$ for its least upper bound, though it 
never equals that number. 

5 A possible scalar product is 

$$\big((a_1, \ldots, a_n), (b_1, \ldots, b_n)\big) = \sum_{k=1}^n a_k^* b_k. \tag{6.9}$$

There are other scalar products for the same vector space, for example 

$$\big((a_1, \ldots, a_n), (b_1, \ldots, b_n)\big) = \sum_{k=1}^n k\,a_k^* b_k. \tag{6.10}$$

In fact any other positive function of k can appear as the coefficient in the sum and it still defines a 
valid scalar product. It's surprising how often something like this happens in real situations. In 
studying normal modes of oscillation the masses of different particles will appear as coefficients in 
a natural scalar product. 

I used complex conjugation on the first factor here, but example 5 referred to real numbers only. 
The reason for leaving the conjugation in place is that when you jump to example 14 you want to 
allow for complex numbers, and it's harmless to put it in for the real case because in that instance 
it leaves the number alone. 
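For a norm, there are many possibilities: 

$$\begin{aligned}
&(1)\quad \|(a_1, \ldots, a_n)\| = \sqrt{\textstyle\sum_{k=1}^n |a_k|^2}\\
&(2)\quad \|(a_1, \ldots, a_n)\| = \textstyle\sum_{k=1}^n |a_k|\\
&(3)\quad \|(a_1, \ldots, a_n)\| = \max_{k=1}^{n} |a_k|\\
&(4)\quad \|(a_1, \ldots, a_n)\| = \max_{k=1}^{n} k|a_k|
\end{aligned} \tag{6.11}$$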



The United States Postal Service prefers a variation on the second of these norms, see problem 8.45. 
6 A possible choice for a scalar product is 

$$(f, g) = \int_a^b dx\, f(x)^* g(x). \tag{6.12}$$



9 Scalar products and norms used here are just like those used for example 5. The difference is that 
the sums go from 1 to infinity. The problem of convergence doesn't occur because there are only 
a finite number of non-zero terms. 
10 Take the norm to be 

$$\|(a_1, a_2, \ldots)\| = \left(\sum_{k=1}^\infty |a_k|^2\right)^{1/2} \tag{6.13}$$

and this by assumption will converge. The natural scalar product is like that of example 5, but with 
the sum going out to infinity. It requires a small amount of proof to show that this will converge. 
See problem 6.19. 

11 A norm is $\|(a_1, a_2, \ldots)\| = \sum_{k=1}^\infty |a_k|$. There is no scalar product that will produce this norm, a fact that 
you can prove by using the results of problem 6.13. 
13 A natural norm is 

$$\|f\| = \left[\int_a^b dx\,|f(x)|^p\right]^{1/p} \tag{6.14}$$

To demonstrate that this is a norm requires the use of some special inequalities found in advanced 
calculus books. 

15 If A and B are two matrices, a scalar product is $(A, B) = \mathrm{Tr}(A^\dagger B)$, where $\dagger$ is the transpose 
complex conjugate of the matrix and Tr means the trace, the sum of the diagonal elements. Several 
possible norms can occur. One is $\|A\| = \sqrt{\mathrm{Tr}(A^\dagger A)}$. Another is the maximum value of $\|A\vec u\|$, 
where $\vec u$ is a unit vector and the norm of $\vec u$ is taken to be $\big[|u_1|^2 + \cdots + |u_n|^2\big]^{1/2}$. 

19 A valid definition of a norm for the motions of a drumhead is its total energy, kinetic plus potential. 
How do you describe this mathematically? It's something like 



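$$\int dx\,dy\,\left[\left(\frac{\partial z}{\partial t}\right)^2 + \left(\frac{\partial z}{\partial x}\right)^2 + \left(\frac{\partial z}{\partial y}\right)^2\right]$$

where $z(x, y, t)$ is the displacement of the drumhead. 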



I've left out all the necessary constants, such as mass density of the drumhead and tension in the 
drumhead. You can perhaps use dimensional analysis to surmise where they go. 
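Several claims in the examples above can be spot-checked numerically. The following minimal Python sketch assumes vectors in example 5 are lists of numbers. It implements the scalar products (6.9) and (6.10), the norm (6.7) each one induces, and the sum-of-absolute-values norm of example 11 cut down to finitely many components. The test is the parallelogram law $\|u+v\|^2 + \|u-v\|^2 = 2\|u\|^2 + 2\|v\|^2$, which every norm of the form (6.7) obeys, so a single violation rules out a scalar product. This is one standard route to the claim in example 11 and not necessarily the one problem 6.13 intends. The function names are hypothetical.

```python
import math

def standard_sp(a, b):
    """Eq. (6.9): the sum of conj(a_k) * b_k."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def weighted_sp(a, b):
    """Eq. (6.10): the sum of k * conj(a_k) * b_k, with k counting from 1."""
    return sum(k * x.conjugate() * y for k, (x, y) in enumerate(zip(a, b), start=1))

def induced_norm(sp):
    """Norm built from a scalar product, as in Eq. (6.7)."""
    return lambda v: math.sqrt(sp(v, v).real)

def sum_norm(v):
    """Sum of |a_k|: a finite-dimensional analogue of the norm of example 11."""
    return sum(abs(x) for x in v)

def parallelogram_defect(norm, u, v):
    """||u+v||^2 + ||u-v||^2 - 2||u||^2 - 2||v||^2.
    Expanding with the scalar product axioms shows this is zero for any norm
    of the form (6.7), so a single non-zero value rules out a scalar product."""
    s = [x + y for x, y in zip(u, v)]
    d = [x - y for x, y in zip(u, v)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * norm(u) ** 2 - 2 * norm(v) ** 2

u, v = [1, 0], [0, 1]
print(parallelogram_defect(induced_norm(standard_sp), u, v))  # zero up to rounding
print(parallelogram_defect(induced_norm(weighted_sp), u, v))  # zero up to rounding
print(parallelogram_defect(sum_norm, u, v))                   # 4
```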

There is an example in criminal law in which the distinctions between some of these norms have 
very practical consequences. If you're caught selling drugs in New York there is a longer sentence if your 
sale is within 1000 feet of a school. If you are an attorney defending someone accused of this crime, 
which of the norms in Eq. (6.11) would you argue for? The legislators who wrote this law didn't know 
linear algebra, so they didn't specify which norm they intended. The prosecuting attorney argued for 
norm #1, "as the crow flies," but the defense argued that "crows don't sell drugs" and humans move 
along city streets, so norm #2 is more appropriate. 

The New York Court of Appeals decided that the Pythagorean norm (#1) is the appropriate one 
and they rejected the use of the pedestrian norm that the defendant advocated (#2). 
www.courts.state.ny.us/ctapps/decisions/nov05/162opn05.pdf 
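To see how much the choice of norm can matter, here is a minimal sketch with hypothetical coordinates (in feet, measured along a rectangular street grid) for a school and a sale. It applies the Euclidean, sum, and maximum norms of Eq. (6.11) to the same displacement. The numbers are illustrative only and are not taken from the court case.

```python
import math

def distance(norm, p, q):
    """Distance ||p - q|| between two points for a chosen norm."""
    return norm([a - b for a, b in zip(p, q)])

def euclidean(v):   # norm (1) of Eq. (6.11): "as the crow flies"
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def taxicab(v):     # norm (2): walking along a rectangular street grid
    return sum(abs(x) for x in v)

def maximum(v):     # the largest single component, a "max" norm from Eq. (6.11)
    return max(abs(x) for x in v)

# Hypothetical positions in feet, along east-west and north-south streets.
school, sale = (0.0, 0.0), (600.0, 700.0)
for name, norm in [("euclidean", euclidean), ("taxicab", taxicab), ("maximum", maximum)]:
    d = distance(norm, sale, school)
    print(f"{name:>9}: {d:7.1f} ft, within 1000 ft: {d <= 1000}")
```

With these coordinates the sale is about 922 feet away by norm #1 but 1300 feet away by norm #2, so the two sides of such a case would reach opposite conclusions.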

6.7 Bases and Scalar Products 

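When there is a scalar product, a most useful type of basis is the orthonormal one, satisfying 

$$(\vec e_i, \vec e_j) = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} \tag{6.15}$$

The notation $\delta_{ij}$ represents the very useful Kronecker delta symbol. 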

In the example of Eq. (6.1) the basis vectors are orthonormal with respect to the scalar product 
in Eq. (6.9). They are orthogonal with respect to the other scalar product mentioned there, Eq. (6.10), but in 
that case they are not all normalized to magnitude one: $(\vec e_k, \vec e_k) = k$. 

To see how the choice of even an orthonormal basis depends on the scalar product, try a different 
scalar product on this space. Take the special case of two dimensions. The vectors are now pairs of 
numbers. Think of the vectors as 2 × 1 column matrices and use the 2 × 2 matrix 

$$\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$$

Take the scalar product of two vectors to be 

$$\big((a_1, a_2), (b_1, b_2)\big) = \begin{pmatrix} a_1^* & a_2^* \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = 2a_1^*b_1 + a_1^*b_2 + a_2^*b_1 + 2a_2^*b_2 \tag{6.16}$$

To show that this satisfies all the defined requirements for a scalar product takes a small amount of 
labor. The vectors that you may expect to be orthogonal, (1 0) and (0 1), are not. 
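A few lines of Python make the point explicit. This is a minimal sketch, and the name `sp_616` is hypothetical. Under Eq. (6.16), (1 0) and (0 1) have scalar product 1, while (1 −1) and (1 1) turn out to be orthogonal. The comment at the end records the algebra behind the positivity requirement.

```python
M = [[2, 1], [1, 2]]   # the matrix used in Eq. (6.16)

def sp_616(a, b):
    """Eq. (6.16): (a, b) = sum over i, j of conj(a_i) * M[i][j] * b_j."""
    return sum(a[i].conjugate() * M[i][j] * b[j] for i in range(2) for j in range(2))

print(sp_616((1, 0), (0, 1)))    # 1: (1 0) and (0 1) are not orthogonal here
print(sp_616((1, 0), (1, 0)))    # 2: and (1 0) does not have unit length either
print(sp_616((1, -1), (1, 1)))   # 0: this pair is orthogonal instead

# Positivity (requirement 4): (a, a) = 2|a1|^2 + 2 Re(conj(a1) a2) + 2|a2|^2
#                                   = |a1 + a2|^2 + |a1|^2 + |a2|^2,
# which is zero only when a1 = a2 = 0.
```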

In example 6, if we let the domain of the functions be $-L < x < +L$ and the scalar product is 
as in Eq. (6.12), then the set of trigonometric functions can be used as a basis. 

$$\sin\frac{n\pi x}{L} \quad\text{and}\quad \cos\frac{m\pi x}{L}, \qquad n = 1, 2, 3, \ldots \quad\text{and}\quad m = 0, 1, 2, 3, \ldots$$

That a function can be written as a series 

$$f(x) = \sum_{n=1}^\infty a_n\sin\frac{n\pi x}{L} + \sum_{m=0}^\infty b_m\cos\frac{m\pi x}{L} \tag{6.17}$$

on the domain $-L < x < +L$ is just an example of Fourier series, and the components of f in this 
basis are the Fourier coefficients $a_1, \ldots, b_0, \ldots$. An equally valid and more succinctly stated basis is 

$$e^{n\pi i x/L}, \qquad n = 0, \pm 1, \pm 2, \ldots$$



Chapter 5 on Fourier series shows many other choices of bases, all orthogonal, but not necessarily 
normalized. 
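Because the sines are orthogonal under Eq. (6.12), each Fourier coefficient is a projection: $a_n = (\sin_n, f)/(\sin_n, \sin_n)$, with the division needed because the basis is not normalized. A minimal sketch, assuming real-valued functions (so the complex conjugate in (6.12) changes nothing) and approximating the integral by the midpoint rule, which is our choice rather than anything from the text. The half-width L, the grid size N, and the test function are hypothetical.

```python
import math

L = 1.0      # half-width of the domain -L < x < L (hypothetical choice)
N = 2000     # number of midpoint-rule panels

def scalar_product(f, g):
    """Eq. (6.12) for real functions on -L < x < L, by the midpoint rule."""
    h = 2 * L / N
    return h * sum(f(-L + (i + 0.5) * h) * g(-L + (i + 0.5) * h) for i in range(N))

def sine(n):
    """The basis vector sin(n pi x / L)."""
    return lambda x: math.sin(n * math.pi * x / L)

def sine_component(f, n):
    """a_n = (sin_n, f) / (sin_n, sin_n)."""
    e = sine(n)
    return scalar_product(e, f) / scalar_product(e, e)

f = lambda x: 3 * math.sin(math.pi * x / L) - 0.5 * math.sin(4 * math.pi * x / L)
print([round(sine_component(f, n), 6) for n in range(1, 6)])   # close to [3, 0, 0, -0.5, 0]
```

The same projection formula gives the cosine coefficients if `sine` is replaced by the corresponding cosines.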
To emphasize the relationship between Fourier series and the ideas of 
vector spaces, this picture represents three out of the infinite number of basis 
vectors and part of a function that uses these vectors to form a Fourier series. 

[Figure: part of a function f(x) and three of its sine basis vectors, combined as a Fourier series] 



The orthogonality of the sines becomes the geometric term "perpendicular," and 
if you look at section 8.11, you will see that the subject of least-squares fitting 
of data to a sum of sine functions leads you right back to Fourier series, and to 
the same picture as here. 

6.8 Gram-Schmidt Orthogonalization 

From a basis that is not orthonormal, it is possible to construct one that is. This device is called the 
Gram-Schmidt procedure. Suppose that a basis is known (finite or infinite), $\vec v_1, \vec v_2, \ldots$ 

Step 1: normalize $\vec v_1$: $\vec e_1 = \vec v_1\big/\sqrt{(\vec v_1, \vec v_1)}$. 

Step 2: construct a linear combination of $\vec v_1$ and $\vec v_2$ that is orthogonal to $\vec v_1$: 
let $\vec e_{20} = \vec v_2 - \vec e_1(\vec e_1, \vec v_2)$ and then normalize it. 

$$\vec e_2 = \vec e_{20}\big/(\vec e_{20}, \vec e_{20})^{1/2}. \tag{6.18}$$

Step 3: Let $\vec e_{30} = \vec v_3 - \vec e_1(\vec e_1, \vec v_3) - \vec e_2(\vec e_2, \vec v_3)$, etc., repeating step 2. 
What does this look like? See problem 6.3. 
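The three steps translate directly into a loop. Here is a minimal Python sketch, assuming vectors are stored as lists of components and the scalar product is passed in as a function; `gram_schmidt` and `sp_616` are hypothetical names. Running it with the scalar product of Eq. (6.16) shows that the orthonormal basis produced depends on which scalar product you use.

```python
import math

def gram_schmidt(vectors, sp):
    """Orthonormalize linearly independent vectors (lists of numbers) with
    respect to the scalar product sp, following Steps 1-3 (classical form)."""
    basis = []
    for v in vectors:
        # e_k0 = v_k - sum over the earlier e_i of e_i (e_i, v_k)
        w = list(v)
        for e in basis:
            c = sp(e, v)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        length = math.sqrt(sp(w, w).real)   # (e_k0, e_k0)^(1/2)
        basis.append([wi / length for wi in w])
    return basis

M = [[2, 1], [1, 2]]

def sp_616(a, b):
    """The scalar product of Eq. (6.16)."""
    return sum(a[i].conjugate() * M[i][j] * b[j] for i in range(2) for j in range(2))

e1, e2 = gram_schmidt([[1, 0], [0, 1]], sp_616)
print(e1, e2)   # e1 = (1/sqrt 2, 0), e2 = (-1/2, 1)/sqrt(3/2)
```

With the ordinary scalar product (6.9) the same routine leaves (1 0) and (0 1) unchanged, because they are already orthonormal under that scalar product.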



6.9 Cauchy-Schwarz inequality 

For common three-dimensional vector geometry, it is obvious that for any real angle, $\cos^2\theta \le 1$. In 
terms of a dot product, this is $|\vec A\cdot\vec B| \le AB$. This can be generalized to any scalar product on any 
vector space: 

$$|(\vec u, \vec v)| \le \|\vec u\|\,\|\vec v\|. \tag{6.19}$$



The proof starts from a simple but not-so-obvious point. The scalar product of a vector with itself is 
by definition non-negative, so for any two vectors $\vec u$ and $\vec v$ you have the inequality 

$$(\vec u - \lambda\vec v, \vec u - \lambda\vec v) \ge 0, \tag{6.20}$$

where $\lambda$ is any complex number. This expands to 

$$(\vec u, \vec u) + |\lambda|^2(\vec v, \vec v) - \lambda(\vec u, \vec v) - \lambda^*(\vec v, \vec u) \ge 0. \tag{6.21}$$



How much bigger than zero the left side is will depend on the parameter $\lambda$. To find the smallest value 
that the left side can have you simply differentiate. Let $\lambda = x + iy$ and differentiate with respect to x 
and y, setting the results to zero. This gives (see problem 6.5) 

$$\lambda = (\vec v, \vec u)/(\vec v, \vec v). \tag{6.22}$$

Substitute this value into the above inequality (6.21) 

$$(\vec u, \vec u) + \frac{|(\vec u, \vec v)|^2}{(\vec v, \vec v)} - \frac{|(\vec u, \vec v)|^2}{(\vec v, \vec v)} - \frac{|(\vec u, \vec v)|^2}{(\vec v, \vec v)} \ge 0. \tag{6.23}$$

This becomes 

$$|(\vec u, \vec v)|^2 \le (\vec u, \vec u)(\vec v, \vec v). \tag{6.24}$$



This isn't quite the result needed, because Eq. (6.19) is written differently. It refers to a norm and I 
haven't established that the square root of $(\vec v, \vec v)$ is a norm. When I do, then the square root of this 
is the desired inequality (6.19). 
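A numerical spot check is no proof, but it is a useful sanity test of Eqs. (6.21) through (6.24). The minimal sketch below uses the scalar product (6.9) on random complex 4-component vectors (the seed and size are arbitrary choices). It confirms that $\lambda$ from Eq. (6.22) minimizes the left side of (6.21) and that the inequality (6.24) holds. The function names are hypothetical.

```python
import random

def sp(u, v):
    """Scalar product of Eq. (6.9), conjugating the first factor."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def q(u, v, lam):
    """Left side of Eq. (6.21), computed as (u - lam v, u - lam v)."""
    w = [a - lam * b for a, b in zip(u, v)]
    return sp(w, w).real

random.seed(1)
def rand_vec(n):
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

u, v = rand_vec(4), rand_vec(4)
lam_star = sp(v, u) / sp(v, v)               # Eq. (6.22)
lhs = abs(sp(u, v)) ** 2
rhs = sp(u, u).real * sp(v, v).real
print(lhs <= rhs)                            # Eq. (6.24) holds for this pair
# nudging lambda away from lam_star in any direction only increases q
print(all(q(u, v, lam_star) <= q(u, v, lam_star + d) for d in (1e-3, -1e-3, 1e-3j, -1e-3j)))
```

Equality in (6.24) occurs when one vector is a multiple of the other, since then $\vec u - \lambda\vec v$ can be made exactly zero.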

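For a couple of examples of this inequality, take specific scalar products. First the common 
directed line segments: 

$$(\vec u, \vec v) = \vec u\cdot\vec v = uv\cos\theta, \quad\text{so}\quad |uv\cos\theta|^2 \le u^2v^2.$$

Second, the scalar product of Eq. (6.12) on functions: 

$$\left|\int_a^b dx\, f(x)^* g(x)\right|^2 \le \int_a^b dx\,|f(x)|^2 \int_a^b dx\,|g(x)|^2$$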



The first of these is familiar, but the second is not, though when you look at it from the general vector 
space viewpoint they are essentially the same. 

Norm from a Scalar Product 

The equation (6.7), $\|\vec v\| = \sqrt{(\vec v, \vec v)}$, defines a norm. Properties one and two for a norm are simple 
to check. (Do so.) The third requirement, the triangle inequality, takes a bit of work and uses the 
inequality Eq. (6.24). 

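$$\begin{aligned}
(\vec v_1 + \vec v_2, \vec v_1 + \vec v_2) &= (\vec v_1, \vec v_1) + (\vec v_2, \vec v_2) + (\vec v_1, \vec v_2) + (\vec v_2, \vec v_1)\\
&\le (\vec v_1, \vec v_1) + (\vec v_2, \vec v_2) + |(\vec v_1, \vec v_2)| + |(\vec v_2, \vec v_1)|\\
&= (\vec v_1, \vec v_1) + (\vec v_2, \vec v_2) + 2|(\vec v_1, \vec v_2)|\\
&\le (\vec v_1, \vec v_1) + (\vec v_2, \vec v_2) + 2\sqrt{(\vec v_1, \vec v_1)(\vec v_2, \vec v_2)}\\
&= \left[\sqrt{(\vec v_1, \vec v_1)} + \sqrt{(\vec v_2, \vec v_2)}\right]^2
\end{aligned}$$

The first inequality is a property of complex numbers: for $z = (\vec v_1, \vec v_2)$, $z + z^* = 2\,\mathrm{Re}\,z \le |z| + |z^*|$. The second one is Eq. (6.24). The square root 
of the last line is the triangle inequality, thereby justifying the use of $\sqrt{(\vec v, \vec v)}$ as the norm of $\vec v$ and in 
the process validating Eq. (6.19). 

$$\|\vec v_1 + \vec v_2\| = \sqrt{(\vec v_1 + \vec v_2, \vec v_1 + \vec v_2)} \le \sqrt{(\vec v_1, \vec v_1)} + \sqrt{(\vec v_2, \vec v_2)} = \|\vec v_1\| + \|\vec v_2\| \tag{6.25}$$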



6.10 Infinite Dimensions 

Is there any real difference between the cases where the dimension of the vector space is finite and the 
cases where it's infinite? Yes. Most of the concepts are the same, but you have to watch out for the 
question of convergence. If the dimension is finite, then when you write a vector in terms of a basis 
$\vec v = \sum_k a_k\vec e_k$, the sum is finite and you don't even have to think about whether it converges or not. In 
the infinite-dimensional case you do. 

It is even possible to have such a series converge, but not to converge to a vector. If that sounds 
implausible, let me take an example from a slightly different context, ordinary rational numbers. These 
are the numbers m/n where m and n are integers ($n \ne 0$). Consider the sequence 

$$1,\ 14/10,\ 141/100,\ 1414/1000,\ 14142/10000,\ 141421/100000,\ \ldots$$

These are quotients of integers, but the limit is $\sqrt 2$ and that's not* a rational number. Within the 
confines of rational numbers, this sequence doesn't converge. You have to expand the context to get 
a limit. That context is the real numbers. The same thing happens with vectors when the dimension 
of the space is infinite — in order to find a limit you sometimes have to expand the context and to 
expand what you're willing to call a vector. 
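Python's exact rational arithmetic makes this tangible. The sketch below, which relies on the standard `fractions` module and `math.isqrt` (Python 3.8 or later), generates the decimal truncations above as exact fractions. `Fraction` prints them in lowest terms, so 14/10 appears as 7/5. The printout shows successive terms getting closer together while their squares approach 2, although no rational number squares to exactly 2. The helper name is hypothetical.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_truncation(n):
    """floor(10**n * sqrt 2) / 10**n as an exact fraction: 1, 14/10, 141/100, ..."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

terms = [sqrt2_truncation(n) for n in range(8)]
for n in range(1, 8):
    r = terms[n]
    gap = terms[n] - terms[n - 1]   # shrinks: the terms crowd together
    shortfall = 2 - r * r           # shrinks too, but never reaches zero
    print(n, r, float(gap), float(shortfall))
```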

Look at example 9 from section 6.3. These are sets of numbers $(a_1, a_2, \ldots)$ with just a finite 
number of non-zero entries. Take a sequence of such vectors 

$$(1, 0, 0, \ldots),\ (1, 1, 0, 0, \ldots),\ (1, 1, 1, 0, 0, \ldots),\ \ldots$$

Each has a finite number of non-zero elements but the limit of the sequence does not. It isn't a vector 
in the original vector space. Can I expand to a larger vector space? Yes, just use example 8, allowing 
any number of non-zero elements. 

For a more useful example of the same kind, start with the same space and take the sequence

(1,0,...), (1,1/2,0,...), (1,1/2,1/3,0,...),...

Again, the limit of such a sequence doesn't have a finite number of entries, but example 10 will hold such a limit, because $\sum_{k=1}^{\infty} 1/k^2 < \infty$.
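To see numerically that these vectors crowd together even though their limit is not in the space of example 9, the sketch below measures distances with the norm $\sqrt{\sum_k |a_k|^2}$, which I'm assuming is the norm coming from the scalar product of example 10. Each vector is stored by its non-zero entries only; `truncated_harmonic` and `dist` are hypothetical names for illustration.

```python
import math

def truncated_harmonic(n):
    """The n-th vector (1, 1/2, ..., 1/n, 0, 0, ...), stored as its non-zero entries."""
    return [1.0 / k for k in range(1, n + 1)]

def dist(u, v):
    """Norm of u - v using sum |a_k|^2, padding the shorter list with zeros."""
    m = max(len(u), len(v))
    u = u + [0.0] * (m - len(u))
    v = v + [0.0] * (m - len(v))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

for n in (10, 100, 1000):
    # ||v_2n - v_n||^2 = sum_{k=n+1}^{2n} 1/k^2 < 1/n, so this distance is below 1/sqrt(n)
    print(n, dist(truncated_harmonic(2 * n), truncated_harmonic(n)))
```

The squared length of the $n$-th vector approaches $\pi^2/6$, a finite number, which is another way of saying the limit belongs to example 10.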

How do you know when you have a vector space without holes in it? That is, one in which these problems with limits don't occur? The answer lies in the idea of a Cauchy sequence. I'll start again with the rational numbers to demonstrate the idea. The sequence of numbers that led to the square root of two has the property that even though the elements of the sequence weren't approaching a rational number, the elements were getting close to each other. Let $r_n$, $n = 1, 2, \ldots$, be a sequence of rational numbers.

$$
\lim_{n,m\to\infty} |r_n - r_m| = 0 \tag{6.26}
$$

means: for any $\epsilon > 0$ there is an $N$ so that if both $n$ and $m$ are $> N$, then $|r_n - r_m| < \epsilon$.

This property defines the sequence $r_n$ as a Cauchy sequence. A sequence of rational numbers converges to a real number if and only if it is a Cauchy sequence; this is a theorem found in many advanced calculus texts. Still other texts will take a different approach and use the concept of a Cauchy sequence to construct the definition of the real numbers.
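As a worked instance of Eq. (6.26), here is one way to check that the decimal sequence for $\sqrt{2}$ is Cauchy. Write $s_n$ for $\sqrt{2}$ truncated to $n$ decimal places, so the list above is $s_0, s_1, s_2, \ldots$. Truncating to more places never decreases the value, and truncation to $m$ places is off by less than $10^{-m}$, so for $n \ge m$

$$
s_m \le s_n \le \sqrt{2} < s_m + 10^{-m}
\quad\Longrightarrow\quad
0 \le s_n - s_m < 10^{-m}, \qquad\text{i.e.}\quad |s_n - s_m| < 10^{-\min(n,m)}.
$$

Given $\epsilon > 0$, pick $N$ with $10^{-N} \le \epsilon$; then $n, m > N$ gives $|s_n - s_m| < \epsilon$. The argument only compares terms of the sequence with each other, which is why the sequence is Cauchy within the rationals even though it has no rational limit.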

The extension of this idea to infinite dimensional vector spaces requires simply that you replace the absolute value by a norm, so that a Cauchy sequence is defined by $\lim_{n,m\to\infty} \|v_n - v_m\| = 0$. A "complete" vector space is one in which every Cauchy sequence converges to a vector in that space. A vector space that has a scalar product and that is also complete using the norm that this scalar product defines is called a Hilbert Space.

I don't want to imply that the difference between finite and infinite dimensional vector spaces is just a technical matter of convergence. In infinite dimensions there is far more room to move around, and the possible structures that occur are vastly more involved than in the finite dimensional case. The subject of quantum mechanics has Hilbert Spaces at the foundation of its whole structure.



* Proof: If it is, then express it in simplest form as $m/n = \sqrt{2} \Rightarrow m^2 = 2n^2$, where $m$ and $n$ have no common factor. This equation implies that $m$ must be even: $m = 2m_1$. Substitute this value, giving $2m_1^2 = n^2$. That in turn implies that $n$ is even, and this contradicts the assumption that the original quotient was expressed without common factors.
Exercises

1 Determine if these are vector spaces with the usual rules for addition and multiplication by scalars. 
If not, which axiom(s) do they violate? 

(a) Quadratic polynomials of the form $ax^2 + bx$

(b) Quadratic polynomials of the form $ax^2 + bx + 1$

(c) Quadratic polynomials $ax^2 + bx + c$ with $a + b + c = 0$

(d) Quadratic polynomials $ax^2 + bx + c$ with $a + b + c = 1$

2 What is the dimension of the vector space of (up to) 5th degree polynomials having a double root at $x = 1$?

3 Starting from three dimensional vectors (the common directed line segments) and a single fixed vector $\vec B$, is the set of all vectors $\vec v$ with $\vec v\cdot\vec B = 0$ a vector space? If so, what is its dimension?

Is the set of all vectors $\vec v$ with $\vec v\times\vec B = 0$ a vector space? If so, what is its dimension?

4 The set of all odd polynomials with the expected rules for addition and multiplication by scalars. Is 
it a vector space? 

5 The set of all polynomials where the function "addition" is defined to be $f_3 = f_2 + f_1$ if the number $f_3(x) = f_1(-x) + f_2(-x)$. Is it a vector space?

6 Same as the preceding, but for (a) even polynomials, (b) odd polynomials.

7 The set of directed line segments in the plane with the new rule for addition: add the vectors according to the usual rule, then rotate the result by 10° counterclockwise. Which vector space axioms are obeyed and which not?
Problems

6.1 Fourier series represents a choice of basis for functions on an interval. For suitably smooth functions on the interval 0 to $L$, one basis is

$$
e_n = \sqrt{\frac{2}{L}}\,\sin\frac{n\pi x}{L}, \qquad n = 1, 2, \ldots \tag{6.27}
$$



Use the scalar product $\langle f, g\rangle = \int_0^L f^*(x)g(x)\,dx$ and show that this is an orthogonal basis normalized to 1, i.e., it is orthonormal.

6.2 Take the function $F(x) = x(L - x)$ between zero and $L$. Use the basis of the preceding problem to write this vector in terms of its components:

$$
F = \sum_{n=1}^{\infty} \alpha_n e_n \tag{6.28}
$$

If you take the result of using this basis and write the resulting function outside the interval $0 < x < L$, graph the result.

6.3 For two dimensional real vectors with the usual parallelogram addition, interpret in pictures the 
first two steps of the Gram-Schmidt process, section 6.8. 

6.4 For two dimensional real vectors with the usual parallelogram addition, interpret the vectors $\vec u$ and $\vec v$ and the parameter $\lambda$ used in the proof of the Cauchy-Schwarz inequality in section 6.9. Start by considering the set of points in the plane formed by $\vec u - \lambda\vec v$ as $\lambda$ ranges over the set of reals. In particular, when $\lambda$ was picked to minimize the left side of the inequality (6.21), what do the vectors look like? Go through the proof and interpret it in the context of these pictures. State the idea of the whole proof geometrically.

Note: I don't mean just copy the proof. Put the geometric interpretation into words.

6.5 Start from Eq. (6.21) and show that the minimum value of the function of $\lambda = x + iy$ is given by the value stated there. Note: this derivation applies to complex vector spaces and scalar products, not just real ones. Is this a minimum?

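6.6 For the vectors in three dimensions,

$$
\vec v_1 = \hat x + \hat y, \qquad \vec v_2 = \hat y + \hat z, \qquad \vec v_3 = \hat z + \hat x
$$

use the Gram-Schmidt procedure to construct an orthonormal basis starting from $\vec v_1$. Ans: $\vec e_3 = (\hat x - \hat y + \hat z)/\sqrt{3}$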
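If you want to check your hand calculation against the stated answer, here is a minimal sketch of classical Gram-Schmidt on components $(x, y, z)$: subtract the projection onto each earlier unit vector, then normalize. The helper names `dot` and `gram_schmidt` are illustrative, and floating-point arithmetic means the result matches $1/\sqrt{3}$ only to rounding.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: remove components along earlier basis vectors, then normalize."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(e, v)  # component of v along the unit vector e
            w = [wi - c * ei for wi, ei in zip(w, e)]
        length = math.sqrt(dot(w, w))
        basis.append([wi / length for wi in w])
    return basis

# v1 = x + y, v2 = y + z, v3 = z + x written as (x, y, z) components
e1, e2, e3 = gram_schmidt([(1, 1, 0), (0, 1, 1), (1, 0, 1)])
print(e3)  # compare with (1, -1, 1)/sqrt(3)
```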

6.7 For the vector space of polynomials in $x$, use the scalar product defined as

$$
\langle f, g\rangle = \int_{-1}^{1} dx\, f(x)^* g(x)
$$

(Everything is real here, so the complex conjugation won't matter.) Start from the vectors

$$
v_0 = 1, \qquad v_1 = x, \qquad v_2 = x^2, \qquad v_3 = x^3
$$

and use the Gram-Schmidt procedure to construct an orthonormal basis starting from $v_0$. Compare these results to the results of section 4.11. [These polynomials appear in the study of electric potentials and in the study of angular momentum in quantum mechanics: Legendre polynomials.]

6.8 Repeat the previous problem, but use a different scalar product: 



[These polynomials appear in the study of the harmonic oscillator in quantum mechanics and in the 
study of certain waves in the upper atmosphere. With a conventional normalization they are called 
Hermite polynomials.] 

6.9 Consider the set of all polynomials in $x$ having degree $< N$. Show that this is a vector space and find its dimension.

6.10 Consider the set of all polynomials in $x$ having degree $< N$ and only even powers. Show that this is a vector space and find its dimension. What about odd powers only?

6.11 Which of these are vector spaces? 

(a) all polynomials of degree 3

(b) all polynomials of degree $< 3$ [Is there a difference between (a) and (b)?]

(c) all functions such that $f(1) = 2f(2)$

(d) all functions such that $f(2) = f(1) + 1$

(e) all functions satisfying $f(x + 2\pi) = f(x)$

(f) all positive functions

(g) all polynomials of degree $< 4$ satisfying $\int_{-1}^{1} dx\, x f(x) = 0$.

(h) all polynomials of degree $< 4$ where the coefficient of $x$ is zero.
[Is there a difference between (g) and (h)?]

6.12 (a) For the common picture of arrows in three dimensions, prove that the subset of vectors $\vec v$ that satisfy $\vec A\cdot\vec v = 0$ for fixed $\vec A$ forms a vector space. Sketch it.

(b) What if the requirement is that both $\vec A\cdot\vec v = 0$ and $\vec B\cdot\vec v = 0$ hold? Describe this and sketch it.

6.13 If a norm is defined in terms of a scalar product, $\|u\| = \sqrt{\langle u, u\rangle}$, it satisfies the "parallelogram identity" (for real scalars),

$$
\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2 \tag{6.29}
$$

6.14 If a norm satisfies the parallelogram identity, then it comes from a scalar product. Again, assume real scalars. Consider combinations of $\|u + v\|^2$ and $\|u - v\|^2$ and construct what ought to be the scalar product. You then have to prove the four properties of the scalar product as stated at the start of section 6.6. Numbers four and three are easy. Number one requires that you keep plugging away, using the parallelogram identity (four times by my count).

Number two is downright tricky; leave it to the end. If you can prove it for integer and rational values of the constant $a$, consider it a job well done. I used induction at one point in the proof. The final step, extending $a$ to all real values, requires some arguments about limits, and is typically the sort of reasoning you will see in an advanced calculus or mathematical analysis course.
6.15 Modify the example number 2 of section 6.3 so that $f_3 = f_1 + f_2$ means $f_3(x) = f_1(x - a) + f_2(x - b)$ for fixed $a$ and $b$. Is this still a vector space?

6.16 The scalar product you use depends on the problem you're solving. The fundamental equation (5.15) started from the equation $u'' = \lambda u$ and resulted in the scalar product

$$
\langle u_2, u_1\rangle = \int_a^b dx\, u_2(x)^* u_1(x)
$$

Start instead from the equation $u'' = \lambda w(x) u$ and see what identity like that of Eq. (5.15) you come to. Assume $w$ is real. What happens if it isn't? In order to have a legitimate scalar product in the sense of section 6.6, what other requirements must you make about $w$?

6.17 The equation describing the motion of a string that is oscillating with frequency $\omega$ about its stretched equilibrium position is

$$
\frac{d}{dx}\left(T(x)\frac{dy}{dx}\right) = -\omega^2 \mu(x)\, y
$$

Here, $y(x)$ is the sideways displacement of the string from zero; $T(x)$ is the tension in the string (not necessarily a constant); $\mu(x)$ is the linear mass density of the string (again, it need not be a constant). The time-dependent motion is really $y(x)\cos(\omega t + \phi)$, but the time dependence does not concern us here. As in the preceding problem, derive the analog of Eq. (5.15) for this equation. For the analog of Eq. (5.16) state the boundary conditions needed on $y$ and deduce the corresponding orthogonality equation. This scalar product has the mass density for a weight.

Ans: $\left[T(x)\left(y_1' y_2^* - y_1 y_2^{*\prime}\right)\right]_a^b = \left(\omega_2^{*2} - \omega_1^2\right)\int_a^b \mu(x)\, y_2^* y_1\, dx$
6.18 The way to define the sum in example 17 is

$$
\sum_x |f(x)|^2 = \lim_{c\to 0}\left\{\text{the sum of } |f(x)|^2 \text{ for those } x \text{ where } |f(x)|^2 > c > 0\right\} \tag{6.30}
$$

This makes sense only if for each $c > 0$, $|f(x)|^2$ is greater than $c$ for just a finite number of values of $x$. Show that the function

$$
f(x) = \begin{cases} 1/n & \text{for } x = 1/n \\ 0 & \text{otherwise} \end{cases}
$$

is in this vector space, and that the function $f(x) = x$ is not. What is a basis for this space? [Take $0 < x < 1$.] This is an example of a vector space with non-countable dimension.



6.19 In example 10, it is assumed that $\sum_{k=1}^{\infty} |a_k|^2 < \infty$. Show that this implies that the sum used for the scalar product also converges: $\sum_{k=1}^{\infty} a_k^* b_k$. [Consider the sums $\sum |a_k + i b_k|^2$, $\sum |a_k - i b_k|^2$, $\sum |a_k + b_k|^2$, and $\sum |a_k - b_k|^2$, allowing complex scalars.]

6.20 Prove strictly from the axioms for a vector space the following four theorems. Each step in your 
proof must explicitly follow from one of the vector space axioms or from a property of scalars or from 
a previously proved theorem. 

(a) The vector $\vec O$ is unique. [Assume that there are two, $\vec O_1$ and $\vec O_2$. Show that they're equal. First step: use axiom 4.]

(b) The number zero times any vector is the zero vector: $0\vec v = \vec O$.

(c) The vector $\vec v\,'$ is unique.

(d) $(-1)\vec v = \vec v\,'$.

6.21 For the vector space of polynomials, are the two functions {1 + x + x^} linearly independent? 

6.22 Find the dimension of the space of functions that are linear combinations of 

{1, sinx, cosx, sin^x, cos^x, sin^x, cos^x, sin^xcos^x} 




6.23 A model vector space is formed by drawing equidistant parallel lines in a plane and labelling adjacent lines by successive integers from $-\infty$ to $+\infty$. Define multiplication by a (real) scalar so that multiplication of the vector by $a$ means multiply the distance between the lines by $1/a$. Define addition of two vectors by finding the intersections of the lines and connecting opposite corners of the parallelograms to form another set of parallel lines. The resulting lines are labelled as the sum of the two integers from the intersecting lines. (There are two choices here; if one is addition, what is the other?) Show that this construction satisfies all the requirements for a vector space. Just as a directed line segment is a good way to picture velocity, this construction is a good way to picture the gradient of a function. In the vector space of directed line segments, you pin the vectors down so that they all start from a single point. Here, you pin them down so that the lines labeled "zero" all pass through a fixed point. Did I define how to multiply by a negative scalar? If not, then you should. This picture of vectors is developed extensively in the text "Gravitation" by Misner, Wheeler, and Thorne.

6.24 In problem 6.11 (g), find a basis for the space. Ans: $1$, $x^2$, $3x - 5x^3$.

6.25 What is the dimension of the set of polynomials of degree less than or equal to 10 and with a 
triple root at x = 1? 

6.26 Verify that Eq. (6.16) does satisfy the requirements for a scalar product. 

6.27 A variation on problem 6.15: $f_3 = f_1 + f_2$ means

(a) $f_3(x) = A f_1(x - a) + B f_2(x - b)$ for fixed $a$, $b$, $A$, $B$. For what values of these constants is this a vector space?

(b) Now what about /3(x) = /i(x^) + /2(x^)? 

6.28 Determine if these are vector spaces: 

(1) Pairs of numbers with addition defined as $(x_1, x_2) + (y_1, y_2) = (x_1 + y_2, x_2 + y_1)$ and multiplication by scalars as $c(x_1, x_2) = (cx_1, cx_2)$.

(2) Like example 2 of section 6.3, but restricted to those $f$ such that $f(x) > 0$. (real scalars)

(3) Like the preceding line, but define addition as $(f + g)(x) = f(x)g(x)$ and $(cf)(x) = \bigl(f(x)\bigr)^c$.
6.29 Do the same calculation as in problem 6.7, but use the scalar product

$$
\langle f, g\rangle = \int_0^1 x^2\, dx\, f^*(x) g(x)
$$



6.30 Show that the following is a scalar product.

$$
\langle f, g\rangle = \int_a^b dx\,\left[f^*(x)g(x) + \lambda f^{*\prime}(x)g'(x)\right]
$$

where $\lambda$ is a constant. What restrictions if any must you place on $\lambda$? The name Sobolev is associated with this scalar product.

6.31 (a) With the scalar product of problem 6.29, find the angle between the vectors 1 and $x$. Here the word angle appears in the sense of $\vec A\cdot\vec B = AB\cos\theta$. (b) What is the angle if you use the scalar product of problem 6.7? (c) With the first of these scalar products, what combination of 1 and $x$ is orthogonal to 1? Ans: 14.48°
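For part (a), a quick check of the stated answer needs only the monomial integrals. Assuming the scalar product of problem 6.29 is $\langle f, g\rangle = \int_0^1 x^2\,dx\,f^*(x)g(x)$, the product of $x^a$ and $x^b$ is $1/(a + b + 3)$, so the three numbers you need are exact fractions. This sketch covers only part (a); `monomial_product` is a hypothetical helper.

```python
import math
from fractions import Fraction

def monomial_product(a, b):
    """<x^a, x^b> = integral from 0 to 1 of x^2 * x^a * x^b dx = 1/(a + b + 3)."""
    return Fraction(1, a + b + 3)

# f = 1 = x^0 and g = x = x^1; everything is real, so no conjugation is needed
cos_theta = monomial_product(0, 1) / math.sqrt(monomial_product(0, 0) * monomial_product(1, 1))
print(math.degrees(math.acos(cos_theta)))
```

Here $\cos\theta = (1/4)/\sqrt{1/15} = \sqrt{15}/4$, which gives the 14.48° quoted above.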

6.32 In the online text linked on the second page of this chapter, you will find that section two of 
chapter three has enough additional problems to keep you happy. 

6.33 Show that the sequence of rational numbers $a_n = \sum_{k=1}^{n} 1/k$ is not a Cauchy sequence. What about $\sum_{k=1}^{n} 1/k^2$?

6.34 In the vector space of polynomials of the form ax + /3a;^, use the scalar product {f,g) = 



Jq dx f{x)*g{x) and construct an orthogonal basis for this space. Ans: One pair is 



6.35 You can construct the Chebyshev polynomials by starting from the successive powers, $x^n$, $n = 0, 1, 2, \ldots$, and applying the Gram-Schmidt process. The scalar product in this case is

$$
\langle f, g\rangle = \int_{-1}^{1} dx\, \frac{f(x)^* g(x)}{\sqrt{1 - x^2}}
$$

The conventional normalization for these polynomials is $T_n(1) = 1$, so you should not try to make the norm of the resulting vectors one. Construct the first four of these polynomials, and show that these satisfy $T_n(\cos\theta) = \cos(n\theta)$. These polynomials are used in numerical analysis because they have the property that they oscillate uniformly between $-1$ and $+1$ on the domain $-1 < x < 1$. Verify that your results for the first four polynomials satisfy the recurrence relation $T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x)$. Also show that $\cos\bigl((n + 1)\theta\bigr) = 2\cos\theta\cos(n\theta) - \cos\bigl((n - 1)\theta\bigr)$.
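Once you have constructed your polynomials, a small numerical checker can confirm the property $T_n(\cos\theta) = \cos(n\theta)$ without giving away the answer. This is a sketch, not part of the problem: coefficients are listed from the constant term upward, the sample count and tolerance are arbitrary choices, and `poly_eval` and `satisfies_cos_identity` are hypothetical names.

```python
import math

def poly_eval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def satisfies_cos_identity(coeffs, n, samples=50, tol=1e-9):
    """Check T_n(cos(theta)) == cos(n*theta) at evenly spaced theta in [0, pi]."""
    for j in range(samples + 1):
        theta = math.pi * j / samples
        if abs(poly_eval(coeffs, math.cos(theta)) - math.cos(n * theta)) > tol:
            return False
    return True

# T_1(x) = x passes for n = 1; plain x**2 (coefficients [0, 0, 1]) fails for n = 2
print(satisfies_cos_identity([0, 1], 1), satisfies_cos_identity([0, 0, 1], 2))
```

Because $\cos\theta$ sweeps all of $[-1, 1]$ as $\theta$ runs over $[0, \pi]$, passing this test at many sample points is good evidence (though not a proof) that your polynomial is correct.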

6.36 In spherical coordinates $(\theta, \phi)$, the angle $\theta$ is measured from the $z$-axis, and the function $f_1(\theta, \phi) = \cos\theta$ can be written in terms of rectangular coordinates as (section 8.8)

$$
f_1(\theta, \phi) = \cos\theta = \frac{z}{r} = \frac{z}{\sqrt{x^2 + y^2 + z^2}}
$$

Pick up the function $f_1$ and rotate it by 90° counterclockwise about the positive $y$-axis. Do this rotation in terms of rectangular coordinates, but express the result in terms of spherical coordinates: sines and cosines of $\theta$ and $\phi$. Call it $f_2$. Draw a picture and figure out where the original and the rotated function are positive and negative and zero.
Now pick up the same $f_1$ and rotate it by 90° clockwise about the positive $x$-axis, again finally expressing the result in terms of spherical coordinates. Call it $f_3$.

If now you take the original $f_1$ and rotate it about some random axis by some random angle, show that the resulting function is a linear combination of the three functions $f_1$, $f_2$, and $f_3$. I.e., all these possible rotated functions form a three dimensional vector space. Again, calculations such as these are much easier to demonstrate in rectangular coordinates.

6.37 Take the functions $f_1$, $f_2$, and $f_3$ from the preceding problem and sketch the shape of the functions

r e-7i(^, 0), r e'^Me, 0), r e-'MO, 0) 

To sketch these, picture them as defining some sort of density in space, ignoring the fact that they are 
sometimes negative. You can just take the absolute value or the square in order to visualize where they 
are big or small. Use dark and light shading to picture where the functions are big and small. Start by 
finding where they have the largest and smallest magnitudes. See if you can find similar pictures in an 
introductory chemistry text. Alternatively, check out winter.group.shef.ac.uk/orbitron/

6.38 Use the results of problem 6.17 and apply them to the Legendre equation Eq. (4.55) to demonstrate that the Legendre polynomials obey $\int_{-1}^{1} dx\, P_n(x)P_m(x) = 0$ if $n \neq m$. Note: the function $T(x)$ from problem 6.17 is zero at these endpoints. That does not imply that there are no conditions on the functions $y_1$ and $y_2$ at those endpoints. The product $T(x)y_1'y_2$ has to vanish there. Use the result stated just after Eq. (4.59) to show that only the Legendre polynomials and not the more general solutions of Eq. (4.58) work.

6.39 Using the result of the preceding problem that the Legendre polynomials are orthogonal, show that the equation (4.62)(a) follows from Eq. (4.62)(e). Square that equation (e) and integrate $\int_{-1}^{1} dx$. Do the integral on the left and then expand the result in an infinite series in $t$. On the right you have integrals of products of Legendre polynomials, and only the squared terms are non-zero. Equate like powers of $t$ and you will have the result.

6.40 Use the scalar product of Eq. (6.16) and construct an orthogonal basis using the Gram-Schmidt 
process and starting from ^""^ (l) Verify that your answer works in at least one special case. 

6.41 For the differential equation $\ddot x + x = 0$, pick a set of independent solutions to the differential
equation — any ones you like. Use the scalar product {f,g) = Jq dx f{x)*g{x) and apply the Gram- 
Schmidt method to find an orthogonal basis in this space of solutions. Is there another scalar product 
that would make this analysis simpler? Sketch the orthogonal functions that you found. 



